Feature(MInference): add triton-based decoding in case flash_attn is not available #35

Merged
merged 2 commits into from
Jul 15, 2024


@liyucheng09 (Contributor) commented on Jul 13, 2024

What does this PR do?

Feature

  • Add Triton-based decoding for HF mode, used as a fallback when flash_attn is not available.
  • vLLM mode stays the same, since it does not require flash_attn during decoding.
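The fallback described above can be sketched as a small backend check. This is a minimal illustration, not the code merged in this PR: `module_available` and `select_decoding_backend` are hypothetical names, and the sketch assumes the choice is made only by testing whether the `flash_attn` and `triton` packages can be imported.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or partially initialised modules.
        return False


def select_decoding_backend(available=module_available) -> str:
    """Pick a decoding attention backend (hypothetical helper).

    Prefers flash_attn when it is installed and falls back to a
    Triton-based kernel otherwise.
    """
    if available("flash_attn"):
        return "flash_attn"
    if available("triton"):
        return "triton"
    raise RuntimeError("neither flash_attn nor triton is installed")


if __name__ == "__main__":
    try:
        print(select_decoding_backend())
    except RuntimeError as err:
        print(f"no decoding backend: {err}")
```

Passing the availability check in as a parameter keeps the selection logic testable on machines that have neither package installed.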

Bug Fixed

UnitTest

  • Passed locally.

Who can review?

@iofu728

@iofu728 changed the title from "add triton-based decoding in case flash_attn is not available" to "Feature(MInference): add triton-based decoding in case flash_attn is not available" on Jul 15, 2024
@iofu728 merged commit 50d17d9 into main on Jul 15, 2024
1 check passed
@iofu728 deleted the decoding-dev branch on July 15, 2024, 05:46
Labels
feature
Development

Successfully merging this pull request may close these issues.

[Question]: Is A6000 supported?
2 participants